Challenges of Urdu Named Entity Recognition: A Scarce Resourced Language

Authors

  • Saeeda Naz
  • Iqbal Umar
  • Hamad Shirazi
  • Sajjad Ahmad Khan
  • Ali Khan
Abstract

In this study, we present a brief overview of Named Entity Recognition (NER) systems, the various approaches followed for NER, and finally NER systems for the Urdu language. Urdu raises several challenges for Natural Language Processing (NLP), largely due to its rich morphology. Research on NER systems for Urdu is at an early stage; therefore, this study focuses on the challenges and peculiarities of Urdu NER. We also explore previous work on NER systems for South and South East Asian Languages (SSEAL). Finally, we summarize the existing work on NER for Urdu, a scarce-resourced and morphologically rich language, and for other SSEAL that share similar features with Urdu.

Similar Resources

N-gram and Gazetteer List Based Named Entity Recognition for Urdu: A Scarce Resourced Language

Extraction of named entities (NEs) from text is an important operation in many natural language processing applications such as information extraction, question answering, and machine translation. Since the early 1990s, researchers have taken greater interest in this field, and a lot of work has been done on Named Entity Recognition (NER) in different languages of the world. Unfortunately...

A Hybrid Approach for NER System for Scarce Resourced Language-URDU: Integrating n-gram with Rules and Gazetteers

We present a hybrid NER (Named Entity Recognition) system for Urdu script that integrates an n-gram model (unigram and bigram), rules, and gazetteers. We used prefix and suffix characters for rule construction instead of first name and last name lists or potential terms on the output list that is produced by the n-gram model. Evaluation of the system is performed on two corpora, the IJCNLP NE (Named E...

Rule-Based Named Entity Recognition in Urdu

Named Entity Recognition or Extraction (NER) is an important task for automated text processing in industry and academia engaged in language processing, intelligence gathering, and bioinformatics. In this paper we discuss the general problem of Named Entity Recognition, and more specifically the challenges of NER in languages that do not have language resources, e.g. large annotated c...

Named Entity Recognition System for Urdu

Named Entity Recognition (NER) is a task that finds person names, location names, brand names, abbreviations, dates, times, etc., and classifies them into predefined categories. NER plays a major role in various Natural Language Processing (NLP) fields such as Information Extraction, Machine Translation, and Question Answering. This paper describes the problems of NER in the c...

A Light Weight Stemmer for Urdu Language: A Scarce Resourced Language

Stemming is a procedure that conflates morphologically related terms into a single term without performing a complete morphological analysis. The Urdu language raises several challenges for Natural Language Processing (NLP), largely due to its rich morphology. A core tool of information retrieval (IR) is the stemmer, which reduces a word to its stem form. Due to the diverse nature of Urdu, developing its ste...

Publication date: 2015